An integrated system for processing information from genealogical text

Authors

  • Merrill Hutchison
  • Tim Richards
  • William Taysom
  • Deryle Lonsdale
Abstract

First, we survey the nonstandard or exaggerated linguistic characteristics that English-language genealogical text (and indeed that of other languages) often exhibits. For example, English genealogical prose avoids frequent repetition of subject pronouns by simply dropping them, though this would usually be considered ungrammatical except in diaries. Also, genealogical text frequently mentions names, dates, and places in ways that cause problems for traditional natural language processing (NLP) systems. We briefly illustrate how variation from grammatical norms is also common in genealogical text in other languages, though for this talk we focus on English. We discuss how this type of prose is typically preprocessed and tokenized, and then mention how our approach is implemented as the first stage in our integrated system. The result of our approach, which preprocesses raw genealogical text, is to render it more amenable to subsequent linguistically based treatment.
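
To make the tokenization issue concrete, the following is a minimal sketch of a tokenizer that keeps dates and common genealogical abbreviations whole, and that flags a sentence opening without a subject pronoun. It is not the system the abstract describes: the patterns, the abbreviation list, the opener list, and the names `tokenize` and `lacks_subject` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns chosen for illustration; none are taken from the paper.
DATE = (r"\d{1,2}\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
        r"[a-z]*\.?\s+\d{4}")
ABBREV = r"(?:b|d|m|bur|bap)\."          # e.g. "b." for born, "m." for married
WORD = r"[A-Za-z]+(?:['-][A-Za-z]+)*"
NUMBER = r"\d+"
PUNCT = r"[^\sA-Za-z0-9]"

# Order matters: dates and abbreviations are tried before plain words,
# numbers, and punctuation, so they are not split apart.
TOKEN_RE = re.compile(f"({DATE})|({ABBREV})|({WORD})|({NUMBER})|({PUNCT})")
KINDS = ("DATE", "ABBREV", "WORD", "NUMBER", "PUNCT")

# Assumed list of words that often open a subjectless genealogical sentence.
SUBJECTLESS_OPENERS = {"Was", "Born", "Married", "Moved", "Died"}


def tokenize(text):
    """Return (kind, token) pairs, keeping dates and abbreviations whole."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        # Exactly one top-level group matches; lastindex identifies it.
        kind = KINDS[match.lastindex - 1]
        tokens.append((kind, match.group()))
    return tokens


def lacks_subject(tokens):
    """Crude heuristic: the sentence starts with a known subjectless opener."""
    return (bool(tokens)
            and tokens[0][0] == "WORD"
            and tokens[0][1] in SUBJECTLESS_OPENERS)


if __name__ == "__main__":
    for kind, token in tokenize("Was b. 12 Mar 1845 in Provo, Utah."):
        print(kind, token)
```

A general-purpose tokenizer would typically split "b." into a word and a sentence-final period and break the date into three unrelated tokens. Keeping them whole is one way a preprocessing stage could hand cleaner input to later linguistic analysis.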

Publication date: 2001